Abstract

Mining the answer to a natural language open-domain question from a large collection of on-line documents is made possible by the recognition of the expected answer type in relevant text passages. Although the technology for retrieving texts where the answer might be found is well developed, few studies have been devoted to the recognition of the answer type.

This paper presents a unified model of answer types for open-domain Question/Answering that enables the discovery of exact answers. The evaluation of the model, performed on real-world questions, considers both the correctness and the coverage of the answer types as well as their contribution to answer precision.

1  Introduction 

Answer mining, a.k.a. textual Question/Answering (Q/A), represents the task of discovering the answer to an open-domain natural language question in large text collections. Answer mining became a topic of significant recent interest, partly due to the popularity of Internet Q/A services like AskJeeves and partly due to the recent evaluations of domain-independent Q/A systems organized in the context of the Text REtrieval Conference (TREC)^1. The TREC evaluations of fully automatic Q/A systems specified two restrictions: (1) there is at least one document in the test collection that contains the answer to a test question; and (2) the answer length is either 50 contiguous bytes (short answers) or 250 contiguous bytes (long answers). These two requirements intentionally simplify the answer mining task, since the identification of the exact answer is left to the user. However, given that the expected information is recognized by inspecting text snippets of relatively small size, the TREC Q/A task took a step closer to information retrieval rather than document retrieval. Moreover, the techniques developed to extract text snippets where the answers might lie paved the way to a unified model for answer mining.

^1 The Text REtrieval Conference (TREC) is a series of workshops organized by the National Institute of Standards and Technology (NIST), designed to advance the state-of-the-art in information retrieval (IR).

To find the answer to a question several steps must be taken, as reported in (Abney et al., 2000) (Moldovan et al., 2000) (Srihari and Li, 2000):

•  First, the question semantics needs to be captured. This translates into identifying (i) the expected answer type and (ii) the question keywords that can be used to retrieve text passages where the answer may be found.

•  Secondly, the index of the document collection must be used to identify the text passages of interest. The retrieval method either employs special operators or simply modifies boolean or vector retrieval. Since the expected answer type is known at the time of the retrieval, the quality of the text passages is greatly improved by filtering out those passages where concepts of the same category as the answer type are not present.

•  Thirdly, answer extraction takes place by combining several features that take into account the expected answer type.

Since the expected answer type is the only information used in all the phases of textual Q/A, its recognition and usage is central to the performance of answer mining.

For an open-domain Q/A system, establishing the possible answer types is a challenging problem. Currently, most of the systems recognize the answer type by associating the question stem (e.g. What, Who, Why or How) and one of the concepts from the question to a predefined general category, such as Person, Organization, Location, Time, Date, Money or Number. Since many of these categories are represented in texts as named entities, their recognition as possible answers is enabled by state-of-the-art Named Entity (NE) recognizers, devised to work with high precision in Information Extraction (IE) tasks. To allow for NE-supported answer mining, a large number of semantic categories corresponding to various names must be considered, e.g. names of cars, names of diseases, names of dishes, names of boats, etc. Furthermore, a significant number of entities are not unique, therefore do not bear names, but are still potential answers to an open-domain question. Additionally, questions do not focus only on entities and their attributes; they also ask about events and their related entities.

In this paper we introduce a model of answer types that accounts for answers to questions of various complexity. The model enables several different formats of the exact answer to open-domain questions and considers also the situation when the answer is produced from a number of different document sources. We define formally the answer types to open-domain questions and extend the recognition of answer types beyond the question processing phase, thus enabling several feed-back mechanisms derived from the processing of documents and answers.

The main contribution of the paper is in providing a unified model of answer mining from large collections of on-line documents that accounts for the processing of open-domain natural language questions of varied complexity. The hope is that a coherent model of textual answer discovery could help develop better text mining methods, capable of acquiring and rapidly prototyping knowledge from the vast amount of on-line texts. Additionally, such a model enables the development of intelligent conversational agents that operate on open-domain tasks.

We first present a background of Q/A systems and then define several classes of question complexity. In Section 3 we present the formal answer type model whereas in Section 4 we show how to recognize the answer type of open-domain questions and use it to mine the answer. Section 5 presents the evaluation of the model and summarizes the conclusions.

2  Background 

Open-Domain  Question/Answering 

To search in a large collection of on-line documents for the answer to a natural language question we need to know (1) what we are looking for, i.e. the expected answer type; and (2) where the answer might be located in the collection. Furthermore, knowing the answer type and recognizing a text passage where the answer might be found is not sufficient for extracting the exact answer. We also need to know the dependencies between the answer type and the other concepts from the question or the answer. For example, if the answer type of the TREC question

QT: How many dogs pull a sled in the Iditarod?

is known to be a number, we also need to be aware that this number must quantify the dogs harnessed to a sled in the Iditarod games and not the number of participants in the games.

Capturing question or answer dependencies can be cast as a straightforward process of mapping syntactic trees to sets of binary head-modifier relationships, as first noted in (Collins, 1996). Given a parse tree, the head-child of each syntactic constituent can be identified based on a simple set of rules used to train syntactic parsers, cf. (Collins, 1996). Dependency relations are established between each leaf corresponding to the head child and the leaves of its constituent siblings that are not stop words, as illustrated by the mapping of Figure 1(a) into Figure 1(b).

Question ETl: What do most tourists visit in Reims?

(a) [parse tree of the question]

(b) Question Dependencies (ETl):  What  most  tourists  visit  Reims

Figure 1: Example of TREC test question

Unlike in IR systems, question stems are considered content words. When question dependencies are known, (Harabagiu et al., 2000) proposed a technique of identifying the answer type based on the semantic category of the question stem and eventually of its most connected dependent concept. For example, in the case of question ETl, illustrated in Figure 1, the answer type is determined by the ambiguous question stem what and the verb visit. The answer type is the object of the verb visit, which is a place of attraction or entertainment, defined by the semantic category Landmark. The answer type replaces the question stem, generating the following dependency graph, that can be later unified with the answer dependency graph:

LANDMARK  most  tourists  visit  Reims
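The mapping from a parse tree to head-modifier pairs described above can be sketched as follows. This is only an illustration under stated assumptions, not the head rules of (Collins, 1996): the `Node` class, the explicit `head_index` annotation, the hand-built tree in `example_tree` and the `STOP_WORDS` set are all hypothetical, and each sibling constituent is represented by its own lexical head.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical stop-word list. Question stems such as "What" are deliberately
# absent, since question stems are treated as content words.
STOP_WORDS = {"do", "in", "the", "a", "of"}


@dataclass
class Node:
    label: str                          # syntactic category or POS tag
    word: Optional[str] = None          # set only for leaves
    children: List["Node"] = field(default_factory=list)
    head_index: int = 0                 # index of the head child (assumed given)


def leaf(label: str, word: str) -> Node:
    return Node(label=label, word=word)


def head_word(node: Node) -> str:
    """Follow head children down to the lexical head of a constituent."""
    while node.word is None:
        node = node.children[node.head_index]
    return node.word


def dependencies(node: Node) -> List[Tuple[str, str]]:
    """Return (head, modifier) pairs, skipping stop-word modifiers."""
    if node.word is not None:
        return []
    deps: List[Tuple[str, str]] = []
    head = head_word(node)
    for i, child in enumerate(node.children):
        if i != node.head_index:
            modifier = head_word(child)
            if modifier.lower() not in STOP_WORDS:
                deps.append((head, modifier))
        deps.extend(dependencies(child))
    return deps


def example_tree() -> Node:
    """Hand-built parse of 'What do most tourists visit in Reims?'."""
    np = Node("NP", children=[leaf("JJS", "most"), leaf("NNS", "tourists")], head_index=1)
    # Assumption: the noun, not the preposition, heads the PP, so that
    # "Reims" rather than the stop word "in" attaches to the verb.
    pp = Node("PP", children=[leaf("IN", "in"), leaf("NNP", "Reims")], head_index=1)
    vp = Node("VP", children=[leaf("VB", "visit"), pp], head_index=0)
    sq = Node("SQ", children=[leaf("VBP", "do"), np, vp], head_index=2)
    whnp = Node("WHNP", children=[leaf("WP", "What")], head_index=0)
    return Node("SBARQ", children=[whnp, sq], head_index=1)
```

Calling `dependencies(example_tree())` yields pairs over the content words What, most, tourists, visit and Reims; replacing the stem What by the answer type category would then give the graph used for unification.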

However, syntactic dependencies vary across question reformulations or equivalent answers made possible by the productive nature of natural language. For example, the dependency structure of the following reformulation of question ETl differs from the dependency structure of ETl:

Reformulation of question ETl: What could I see in Reims?

LANDMARK  I  see  Reims

Due to the fact that the verbs see and visit are synonyms (cf. WordNet (Miller, 1995)) and the pronoun I can be read as a possible visitor, the dependency structures of ETl and its reformulation can be mapped one into another. The mapping is produced by unifying the two structures when lexical and semantic alternations are allowed. Possible lexical alternations are synonyms or morphological alternations. Semantic alternations consist of hypernyms, entailments or paraphrases. The unifying mapping of ETl and its reformulation shows that the two questions are equivalent only when I refers to a visitor; other readings of the reformulation are possible when the referent is an investigator or a politician. In each of the other readings, the answer type of the question would be different. The unifying mapping of ETl and its reformulation is:

LANDMARK    I → tourists    see/visit    Reims
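The unification of two dependency structures under lexical and semantic alternations can be illustrated with a small sketch. It assumes dependencies are given as (head, modifier) pairs and uses a hypothetical `ALTERNATIONS` table in place of real WordNet lookups; the greedy, one-directional matching is a simplification chosen for brevity, not the unification procedure of the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

Dependency = Tuple[str, str]  # (head, modifier)

# Hypothetical alternation classes standing in for WordNet lookups:
# words in the same set are allowed to unify with each other.
ALTERNATIONS: List[Set[str]] = [
    {"see", "visit"},      # lexical alternation: synonyms
    {"I", "tourists"},     # semantic alternation: "I" read as a visitor
]


def compatible(a: str, b: str) -> bool:
    """Two words unify if they are identical or share an alternation class."""
    return a == b or any(a in s and b in s for s in ALTERNATIONS)


def unify(q1: List[Dependency], q2: List[Dependency]) -> Optional[Dict[str, str]]:
    """Map every dependency of q1 onto a compatible dependency of q2.

    Returns the word-to-word mapping, or None if some dependency of q1
    has no counterpart in q2.
    """
    mapping: Dict[str, str] = {}
    for head, mod in q1:
        match = next(
            ((h, m) for h, m in q2 if compatible(head, h) and compatible(mod, m)),
            None,
        )
        if match is None:
            return None
        mapping[head], mapping[mod] = match
    return mapping


# Dependency structures of the two Reims questions, with the question stem
# already replaced by the answer type LANDMARK (arcs are assumed for the example).
ORIGINAL = [("visit", "LANDMARK"), ("visit", "tourists"),
            ("tourists", "most"), ("visit", "Reims")]
REFORMULATION = [("see", "LANDMARK"), ("see", "I"), ("see", "Reims")]
```

With these inputs, `unify(REFORMULATION, ORIGINAL)` maps see to visit and I to tourists. Removing the second alternation class makes the unification fail, which mirrors the remark that the questions are equivalent only when I refers to a visitor.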

Similarly, a pair of equivalent answers is recognized when lexical and semantic alternations of the concepts are allowed. This observation is crucial for answer mining because:

1. it establishes the dependency relations as the basic processing level for Q/A; and

2. it defines the search space based on alternations of the question and answer concepts.

Consequently, lexical and semantic alternations are incorporated as feedback loops in the architecture of open-domain Q/A systems, as illustrated in Figure 2.

To locate answers, text passages are retrieved based on keywords assembled from the question dependency structure. At the time of the query, it is unknown which keywords can be unified with answer dependencies. However, the relevance of the query is determined by the number of resulting passages. If too many passages are generated, the query was too broad, thus it needs a specialization by adding a new keyword. If too few passages were retrieved, the query was too specific, thus one keyword needs to be dropped. The relevance feedback based on the number of retrieved passages ends when no more keywords can be added or dropped. After this, the unification of the question and answer dependencies is produced and the lexical alternations imposed by unifications are added to the list of keywords, making possible the retrieval of new, unseen text passages, as illustrated in Figure 2.

[Diagram connecting Question, On-line Documents and Answer]

Figure 2: A diagram of the feedbacks supporting Open-Domain Q/A
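The passage-count feedback loop can be sketched as follows. The thresholds `min_passages` and `max_passages`, the keyword priority order, the starting size and the toy `boolean_and` search over `CORPUS` are assumptions made for the example; the text above only states that keywords are added when too many passages are returned and dropped when too few are.

```python
from typing import Callable, List, Sequence, Set

# Toy collection and boolean AND retrieval, used only to exercise the loop.
CORPUS = [
    "Tourists in Reims visit the cathedral.",
    "Reims is a city in France.",
    "Many tourists visit Reims for champagne.",
]


def boolean_and(keywords: List[str]) -> List[str]:
    """Return passages that contain every keyword (case-insensitive)."""
    return [p for p in CORPUS if all(k.lower() in p.lower() for k in keywords)]


def retrieve_with_feedback(
    keywords: Sequence[str],                      # ordered by assumed priority
    search: Callable[[List[str]], List[str]],
    min_passages: int = 1,
    max_passages: int = 50,
    start: int = 2,
) -> List[str]:
    """Grow or shrink the active keyword prefix based on the passage count."""
    active = min(start, len(keywords))
    tried: Set[int] = set()
    passages: List[str] = []
    # Stop when no keyword can be added or dropped; the `tried` set also
    # stops the loop if it would oscillate between two query sizes.
    while 0 < active <= len(keywords) and active not in tried:
        tried.add(active)
        passages = search(list(keywords[:active]))
        if len(passages) > max_passages:
            active += 1          # query too broad: add a keyword
        elif len(passages) < min_passages:
            active -= 1          # query too specific: drop a keyword
        else:
            break
    return passages
```

For instance, `retrieve_with_feedback(["Reims", "tourists", "cathedral"], boolean_and, max_passages=1, start=1)` adds keywords one at a time until a single passage remains.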

The unification of dependency structures allows erroneous answers when the resulting mapping is a sparse graph. To justify the correctness of the answer an abductive proof backchaining from the answer to the question must be produced. Such abductive mechanisms are detailed in (Harabagiu et al., 2000). Moreover, the proof relies on lexico-semantic knowledge available from WordNet as well as rapidly formatted knowledge bases generated by mechanisms described in (Chaudri et al., 2000). The justification process brings forward semantic alternations that are added to the list of keywords, the feedback destination of all loops represented in Figure 2.

Mining the exact answer does not always end after extracting the answer type from a correct text snippet, because such extractions often yield only partial answers that need to be fused together. The fusion mechanisms are dictated by the answer type.

Question  Complexity 

Open-Domain natural language questions can also be of different complexity levels. Generally, the test questions used in the TREC evaluations were qualified as fact-based questions (cf. (Voorhees and Tice, 2000)) as they mainly were short inquiries about attributes or definitions of some entity or event. Table 1 lists a sample of TREC test questions.

Question                              Exact answer
Where is Romania located?             Europe
Who wrote "Dubliners"?                James Joyce
What is the wingspan of a condor?     9 feet
What is the population of Japan?      120 million
What king signed the Magna Carta?     King John
Name a flying mammal.                 bat

Table 1: TREC test questions and their exact answers

The TREC test set did not include any question that can be modeled as an Information Extraction (IE) task. Typically, IE templates model queries regarding who did an event of interest, what was produced by that event, when and where and eventually why. The event of interest is a complex event, like terrorism in Latin America, joint ventures or management successions. An example of template-modeled question is:

What management successions occurred at IBM in 1999?

In addition, questions may also ask about developments of events or trends that are usually answered by a text summary. Since data producing these summaries can be sourced in different documents, summary fusion techniques as proposed in (Radev and McKeown, 1998) can be employed. Template-based questions and summary-asking inquiries cover most of the classes of question complexity proposed in (Moldovan et al., 2000). Although the topic of natural language open-domain question complexity needs further study, we consider herein the following classes of questions:

•  Class 1: Questions inquiring about entities, events, entity attributes (including number), event themes, event manners, event conditions and event consequences.

•  Class 2: Questions modeled by templates, including questions that focus only on one of the template slots (e.g. “What managers were promoted last year at Microsoft?”).

•  Class 3: Questions asking for a summary that is produced by fusing template-based information from different sources (e.g. “What happened after the Titanic sunk?”).

Since (Radev and McKeown, 1998) describes the summary fusion mechanisms, Class 3 of questions can be reduced in this paper to Class 2, which deals with the processing of the template.

3  A  Model  of  Answer  Types 

This section describes a knowledge-based model of open-domain natural language answer types (ATs). In particular we formally define the answer type through a quadruple

AT = [Category, Dependency, Number, Format].

The Category is defined as one of the following possibilities:

1. one of the tops of a predefined Answer Taxonomy or one of its nodes;

2. Definition;

3. Template; or

4. Summary.

For expert Q/A systems, this list of categories can be extended. The Dependency is defined as the question dependency structure when the CATEGORY belongs to the Answer Taxonomy or is a Definition. Otherwise it is a template automatically generated. The Number is a flag indicating whether the answer should contain a single datum or a list of elements. The FORMAT defines the text span of the exact answer. For example, if the Category is DIMENSION, the Format is <Number><Measuring Unit>.
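As an illustration of the quadruple, the answer type can be written down as a small record. The class and field names below, the `Number` and `SpecialCategory` enumerations and the example dependency pair are hypothetical choices made for this sketch; only the four components and the DIMENSION format come from the definition above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple, Union


class SpecialCategory(Enum):
    """Categories that are not nodes of the Answer Taxonomy."""
    DEFINITION = "Definition"
    TEMPLATE = "Template"
    SUMMARY = "Summary"


class Number(Enum):
    """Flag telling whether a single datum or a list is expected."""
    SINGLE = "single"
    LIST = "list"


@dataclass(frozen=True)
class AnswerType:
    # Name of an Answer Taxonomy node (e.g. "DIMENSION") or a special category.
    category: Union[str, SpecialCategory]
    # Question dependency structure as (head, modifier) pairs when the category
    # is a taxonomy node or a Definition; otherwise it would hold the slots of
    # an automatically generated template.
    dependency: Tuple[Tuple[str, str], ...]
    number: Number
    # Text span of the exact answer.
    format: str


# Hypothetical answer type for "What is the wingspan of a condor?".
wingspan = AnswerType(
    category="DIMENSION",
    dependency=(("wingspan", "condor"),),
    number=Number.SINGLE,
    format="<Number><Measuring Unit>",
)
```

Making the record immutable is a design choice of the sketch; it keeps an answer type usable as a dictionary key when the same type is shared between question processing and answer extraction.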

The Answer Taxonomy was created in three steps:

Step 1: We devise a set of top categories modeled after the semantic domains encoded in the WordNet database, which contains 25 noun categories and 15 verb categories. The top of each WordNet hierarchy corresponding to every semantic category was manually inspected to select the most representative nodes and add them to the tops of the Answer Taxonomy. Furthermore we have added open semantic categories corresponding to named entities. For example Table 2 lists the named entity categories we have considered in our experiments. Many of the tops of the Answer Taxonomy are further categorized, as illustrated in Figure 3. In total, we have considered 33 concepts as tops of the taxonomy.


[Hierarchies rooted at Numerical Value and Location]

Figure 3: Two examples of top answer hierarchies.

Step 2: The additional categorization of the top Answer Taxonomy generates a many-to-many mapping of the Named Entity categories in the tops of the Answer Taxonomy. Figure 4 illustrates some of the mappings.


date         time       organization     city
product      price      country          money
human        disease    phone number     continent
percent      province   other location   plant
mammal       alphabet   airport code     game
bird         reptile    university       dog breed
number       quantity   landmark         dish

Table 2: Named Entity Categories.


[Mappings involving the categories money, price, quantity and number]

Figure 4: Mappings of answer types in named entity categories.


Step 3: Each leaf from the top of the Answer Taxonomy is connected to one or several WordNet subhierarchies. Figure 5 illustrates a fragment of the Answer Taxonomy comprising several WordNet subhierarchies.

[Taxonomy fragment. Nodes shown: Nationality, Numerical Value, Location; Degree, Temperature, Duration, Count, Speed, Dimension. Example questions attached to the nodes: "How hot does the inside of an active volcano get?", "What is the duration of the trip from...?", "What is the wingspan of a condor?", "How big is our galaxy in diameter?"]

Figure 5: Fragment of the Answer Taxonomy.

4  Answer  Recognition  and  Extraction 

In this section we show how, given a question and its dependency structure, we can recognize its answer type and consequently extract the exact answer. Here we describe four representative cases.

Case 1: The CATEGORY of the answer type is Definition when the question can be matched by one of the following patterns:


(Q-P1): What {is|are} <phrase_to_define>?
(Q-P2): What is the definition of <phrase_to_define>?
(Q-P3): Who {is|was|are|were} <person_name(s)>?

The format of the Definition answers is similarly dependent on a set of patterns, determined as the head of the <Answer_phrase>:

(A-P1): [<phrase_to_define> {is|are}] <Answer_phrase>
(A-P2): [<phrase_to_define>, {a|the|an}] <Answer_phrase>
(A-P3): [<phrase_to_define> -] <Answer_phrase>
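The question-side patterns can be approximated with regular expressions, as in the sketch below. The regexes, the dictionary `QUESTION_PATTERNS` and the function `match_definition_question` are assumptions of this example. In particular, the first pattern as written here also accepts questions such as "What is the wingspan of a condor?", so a real system would need an additional check on what counts as a phrase to define.

```python
import re
from typing import Optional, Tuple

# Surface approximations of the question patterns Q-P1 to Q-P3.
QUESTION_PATTERNS = {
    "Q-P1": re.compile(r"^What\s+(?:is|are)\s+(?P<phrase>.+?)\s*\?$", re.IGNORECASE),
    "Q-P2": re.compile(
        r"^What\s+is\s+the\s+definition\s+of\s+(?P<phrase>.+?)\s*\?$", re.IGNORECASE
    ),
    "Q-P3": re.compile(r"^Who\s+(?:is|was|are|were)\s+(?P<phrase>.+?)\s*\?$", re.IGNORECASE),
}

# Q-P2 is tried first because every Q-P2 question also matches the looser Q-P1.
MATCH_ORDER = ("Q-P2", "Q-P1", "Q-P3")


def match_definition_question(question: str) -> Optional[Tuple[str, str]]:
    """Return (pattern name, phrase to define), or None if no pattern applies."""
    text = question.strip()
    for name in MATCH_ORDER:
        found = QUESTION_PATTERNS[name].match(text)
        if found:
            return name, found.group("phrase")
    return None
```

The extracted phrase can then be used to instantiate the answer-side patterns, for example by searching retrieved passages for the phrase followed by is or are.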


Case 2: The dependency structure of the question indicates that a special instance of a concept is sought. The cues are given either by the presence of words kind, type, name or by the question stems what or which connected to the object of a verb. Table 3 lists a set of such questions and their corresponding answers. In this case the answer type is given by the subhierarchy defined by the node from the dependency structure whose adjunct is either kind, type, name or the question stem. In this situation the CATEGORY does not belong to the top of the Answer Taxonomy, but it is rather dynamically created by the interpretation of the dependency graph.

For example, the dynamic CATEGORY bridge, generated for Q204 from Table 3, contains 14 member instances, including viaduct, rope bridge and suspension bridge. Similarly, question Q581 generates a dynamic CATEGORY flower, with 470 member instances, comprising orchid, petunia and sunflower. For dynamic categories all member instances are searched in the retrieved passages during answer extraction to detect candidate answers.

Case 3: In all other cases, the concept related to the question stem in the question dependency graph is searched through the Answer Taxonomy, returning the answer type as the top of its hierarchy. Figure 5 illustrates several questions and their answer type CATEGORY.

Case 4: Whenever the semantic dependencies of several correct answers can be mapped one into another, we change the CATEGORY of the answer type into Template. The slots of the actual template are determined by a three step procedure, that we illustrate with a walk-through example corresponding to the question What management successions occurred at IBM in 1999?:

Step 1: For each pair of extracted candidate answers unify the dependency graphs and find common generalizations whenever possible. Figure 6(a) illustrates some of the mappings.

Step 2: Identify across mappings the common categories and the trigger-words that were used as keywords. In Figure 6(a) the trigger words are boldfaced.

Step 3: Collect all common categories in a template and use their names as slots. Figure 6(b) illustrates the resulting template.

Q204: What type of bridge is the Golden Gate Bridge?
Answer: the Seto Ohashi Bridge, consisting of six suspension bridges in the style of Golden Gate Bridge.

Q267: What is the name for clouds that produce rain?
Answer: Acid rain in Cheju Island and the Taean peninsula is carried by rain clouds from China.

Q503: What kind of sports team is the Buffalo Sabres?
Answer: Alexander Mogilny hopes to continue his hockey career with the NHL’s Buffalo Sabres.

Q581: What flower did Vincent Van Gogh paint?
Answer: In March 1987, van Gogh’s “Sunflowers” sold for $39.9 million at Christie’s in London

Table 3: TREC test questions and their answers. The exact answer is emphasized.

(a)
Person1  replace/succeed  Person2  Organization  Position
Organization  nominate/assign  Person  Position
Person  resign/leave  Position  Organization

(b)
Position  Person1  Person2  Organization

Figure 6: Dependencies that generate templates.
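The collection of template slots in Step 3 can be sketched as below. The sketch assumes that Steps 1 and 2 have already produced generalized structures keyed by their trigger words, written here by hand in the spirit of Figure 6(a). The rule for numbering repeated categories (`Person1`, `Person2`) is an assumption of the example rather than a procedure stated in the text.

```python
from collections import Counter
from typing import Dict, List

# Generalized dependency structures: trigger words mapped to the semantic
# categories of their arguments (hand-written for the example).
STRUCTURES: Dict[str, List[str]] = {
    "replace/succeed": ["Person", "Person", "Organization", "Position"],
    "nominate/assign": ["Organization", "Person", "Position"],
    "resign/leave": ["Person", "Position", "Organization"],
}


def build_template(structures: Dict[str, List[str]]) -> List[str]:
    """Collect the categories seen across structures and name the slots.

    A category that occurs several times within one structure yields
    numbered slots, so that every argument has a slot to fill.
    """
    needed: Counter = Counter()
    for categories in structures.values():
        for category, count in Counter(categories).items():
            needed[category] = max(needed[category], count)
    slots: List[str] = []
    for category, count in needed.items():
        if count == 1:
            slots.append(category)
        else:
            slots.extend(f"{category}{i}" for i in range(1, count + 1))
    return slots
```

On the structures above, `build_template(STRUCTURES)` produces the same four slots as the template of Figure 6(b), although not necessarily in the same order.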

This procedure is a reverse-engineering of the mechanisms used generally in Information Extraction (IE), where given a template, linguistic patterns are acquired to identify the text fragments having relevant information. In the case of answer mining, the relevant text passages are known. The dependency graphs help finding the linguistic rules and are generalized in a template.

To be able to generate the template we also need to have a way of extracting the text where the answer dependencies are detected. For this purpose we have designed a method that employs a simple machine learning mechanism: the perceptron. For each text passage retrieved by the keyword-based query we define the following seven features:

•  rel_SP: the number of question words matched in the same phrase as the answer type CATEGORY;
•  rel_SS: the number of question words matched in the same sentence as the answer type CATEGORY;
•  rel_FP: a flag set to 1 if the answer type CATEGORY is followed by a punctuation sign, and set to 0 otherwise;
•  rel_OCTW: the number of question word matches separated from the answer type CATEGORY by at most three words and one comma;
•  rel_SWS: the number of question words occurring in the same order in the answer text as in the question;
•  rel_DTW: the average distance from the answer type CATEGORY to any of the question word matches;
•  rel_NMW: the number of question words matched in the answer text.

To train the perceptron we annotated the correct answers of 200 of the TREC test questions. Given a pair of answers, in which one of the answers is correct, we compute a relative comparison score using the formula:

\[
\begin{aligned}
rel_{pair} = {} & w_{SWS} \times \Delta rel_{SWS} + w_{FP} \times \Delta rel_{FP} \\
& + w_{OCTW} \times \Delta rel_{OCTW} + w_{SP} \times \Delta rel_{SP} \\
& + w_{SS} \times \Delta rel_{SS} + w_{NMW} \times \Delta rel_{NMW} \\
& + w_{DTW} \times \Delta rel_{DTW} + threshold
\end{aligned}
\]

The perceptron learns the seven weights as well as the value of the threshold used for future tests on the remaining 693 TREC questions. Whenever the relative score is larger than the threshold, a passage is extracted as a candidate answer. In our experiments, the performance of the perceptron surpassed the performance of decision trees for answer extraction.
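A pairwise perceptron over the seven features can be sketched as follows. Several points are assumptions of the sketch and are not stated in the text: each Δ term is read as the difference between the feature values of the two answers, a positive score is taken to mean that the first answer of the pair is preferred, and the update rule, learning rate and number of epochs are the textbook perceptron choices. The helper `feats` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

FEATURES = ["SWS", "FP", "OCTW", "SP", "SS", "NMW", "DTW"]
Features = Dict[str, float]


def feats(**values: float) -> Features:
    """Build a full feature vector, defaulting unspecified features to 0."""
    return {name: float(values.get(name, 0.0)) for name in FEATURES}


def rel_pair(weights: Features, threshold: float, a: Features, b: Features) -> float:
    """Weighted sum of feature differences between answers a and b, plus threshold."""
    return sum(weights[f] * (a[f] - b[f]) for f in FEATURES) + threshold


def train(
    pairs: List[Tuple[Features, Features, int]],
    epochs: int = 10,
    learning_rate: float = 0.1,
) -> Tuple[Features, float]:
    """Learn weights and threshold from (a, b, label) triples.

    label is +1 when a is the correct answer and -1 when b is.
    """
    weights = {f: 0.0 for f in FEATURES}
    threshold = 0.0
    for _ in range(epochs):
        for a, b, label in pairs:
            if label * rel_pair(weights, threshold, a, b) <= 0:
                # Misclassified pair: move the weights toward the correct side.
                for f in FEATURES:
                    weights[f] += learning_rate * label * (a[f] - b[f])
                threshold += learning_rate * label
    return weights, threshold


# Toy training data: the correct answer matches more question words.
GOOD = feats(NMW=3, SS=2)
BAD = feats(NMW=1, SS=0)
WEIGHTS, THRESHOLD = train([(GOOD, BAD, +1), (BAD, GOOD, -1)])
```

After training on the toy pairs, `rel_pair(WEIGHTS, THRESHOLD, GOOD, BAD)` is positive and the score with the arguments swapped is negative, so the sign of the score ranks the two candidates.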


5  Evaluations  and  Conclusion 

To evaluate our answer type model we used 693 TREC test questions on which we did not train the perceptron. Table 4 lists the breakdown of the answer type Categories recognized by our model as well as the coverage and precision of the recognition. Currently our Answer Taxonomy encodes 8707 concepts from 129 WordNet hierarchies, covering only 81% of the expected answer types. This shows that we have to continue encoding more top concepts in the taxonomy and link them to more WordNet concepts.

The recognition mechanism had better precision than coverage in our experiments. Moreover, there is a relationship between the coverage of answer type recognition and the overall performance of answer mining, as illustrated in Table 4. Some of the test questions are listed in Tables 1 and 3. The experiments were conducted by using 736,794 on-line documents from Los Angeles Times, Foreign Broadcast Information Service, Financial Times, AP Newswire, Wall Street Journal and San Jose Mercury News.


Category (# Questions)           Precision   Coverage
Definition (64)                  91%         84%
Top Answer Taxonomy (439)        79%         74%
Dynamic answer category (17)     86%         79%
Template (14)                    93%         65%

# Answer Taxonomy Tops   Answer Type Coverage   Q/A Precision
8                        44%                    42%
22                       56%                    55%
33                       83%                    78%

Table 4: Evaluation results.

The experiments show that open-domain natural language questions of varied degrees of complexity can be answered consistently from vast amounts of on-line texts. One of the applications of a unified model of answer mining is the development of intelligent conversational agents (Harabagiu et al., 2001).